A Study of Performance Scalability by Parallelizing Loop Iterations on Multi-core SMPs

نویسندگان

Prakash Raghavendra

Akshay Kumar Behki

K. Hariprasad

Madhav Mohan

Praveen Jain

Srivatsa S. Bhat

V. M. Thejus

Vishnumurthy Prabhu

چکیده

Today, the challenge is to exploit the parallelism available in the way of multi-core architectures by the software. This could be done by re-writing the application, by exploiting the hardware capabilities or expect the compiler/software runtime tools to do the job for us. With the advent of multi-core architectures ([1] [2]), this problem is becoming more and more relevant. Even today, there are not many run-time tools to analyze the behavioral pattern of such performance critical applications, and to re-compile them. So, techniques like OpenMP for shared memory programs are still useful in exploiting parallelism in the machine. This work tries to study if the loop parallelization (both with and without applying transformations) can be a good case for running scientific programs efficiently on such multi-core architectures. We have found the results to be encouraging and we strongly feel that this could lead to some good results if implemented fully in a production compiler for multi-core architectures.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimizing Sparse Matrix Vector Multiplication on SMPs

We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two diierent graph algorithms. We present a performance study of this algorithmic kernel, showing ho...

متن کامل

Scalable Automatic Parallelization of Irregular Reductions on Shared Memory Multiprocessors

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scal-able shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are further distributed across processors. Iterations belonging to the same set are chosen in such a way that update diierent entries in the reduction array. Tha...

متن کامل

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...

متن کامل

Combining building blocks for parallel multi-level matrix multiplication

EXTENDED ABSTRACT Matrix-matrix multiplication is one of the core computations in many algorithms from scientific computing or numerical analysis and many efficient realizations have been invented over the years, including many parallel ones. The current trend to use clusters of PCs or SMPs for scientific computing suggests to revisit matrix-matrix multiplication and investigate efficiency and ...

متن کامل

Optimizing a multi-product closed-loop supply chain using NSGA-II, MOSA, and MOPSO meta-heuristic algorithms

This study aims to discuss the solution methodology for a closed-loop supply chain (CLSC) network that includes the collection of used products as well as distribution of the new products. This supply chain is presented on behalf of the problems that can be solved by the proposed meta-heuristic algorithms. A mathematical model is designed for a CLSC that involves three objective functions of ma...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

A Study of Performance Scalability by Parallelizing Loop Iterations on Multi-core SMPs

نویسندگان

چکیده

منابع مشابه

Optimizing Sparse Matrix Vector Multiplication on SMPs

Scalable Automatic Parallelization of Irregular Reductions on Shared Memory Multiprocessors

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Combining building blocks for parallel multi-level matrix multiplication

Optimizing a multi-product closed-loop supply chain using NSGA-II, MOSA, and MOPSO meta-heuristic algorithms

عنوان ژورنال:

اشتراک گذاری